Visibility metrics in multi-view medical activity recognition systems and methods

ABSTRACT

Visibility metrics in multi-view medical activity recognition systems and methods are described herein. In certain illustrative examples, a system accesses imagery of a scene of a medical session captured by a plurality of sensors from a plurality of viewpoints, the imagery including first imagery captured by a first sensor of the plurality of sensors from a first viewpoint of the plurality of viewpoints. The system determines, during the medical session and based on the first imagery, a value of an activity visibility metric for the first sensor. The system facilitates, based on the value of the activity visibility metric, adjusting the first viewpoint of the first sensor.

RELATED APPLICATIONS

The present application claims priority to U.S. Provisional Patent Application No. 63/141,830, filed Jan. 26, 2021, and to U.S. Provisional Patent Application No. 63/141,853, filed Jan. 26, 2021, and to U.S. Provisional Patent Application No. 63/113,685, filed Nov. 13, 2020, the contents of which are hereby incorporated by reference in their entirety.

BACKGROUND INFORMATION

Computer-implemented activity recognition typically involves capture and processing of imagery of a scene to determine characteristics of the scene. Conventional activity recognition may lack a desired level of accuracy and/or reliability for dynamic and/or complex environments. For example, some objects in a dynamic and complex environment, such as an environment associated with a surgical procedure, may become obstructed from the view of an imaging device.

SUMMARY

The following description presents a simplified summary of one or more aspects of the systems and methods described herein. This summary is not an extensive overview of all contemplated aspects and is intended to neither identify key or critical elements of all aspects nor delineate the scope of any or all aspects. Its sole purpose is to present one or more aspects of the systems and methods described herein as a prelude to the detailed description that is presented below.

An illustrative system includes a memory storing instructions and a processor communicatively coupled to the memory and configured to execute the instructions to access imagery of a scene of a medical session captured by a plurality of sensors from a plurality of viewpoints, the imagery including first imagery captured by a first sensor of the plurality of sensors from a first viewpoint of the plurality of viewpoints; determine, during the medical session and based on the first imagery, a value of an activity visibility metric for the first sensor; and facilitate, based on the value of the activity visibility metric, adjusting the first viewpoint of the first sensor.

An illustrative method includes accessing, by a processor, imagery of a scene of a medical session captured by a plurality of sensors from a plurality of viewpoints, the imagery including first imagery captured by a first sensor of the plurality of sensors from a first viewpoint of the plurality of viewpoints; determining, by the processor, during the medical session and based on the first imagery, a value of an activity visibility metric for the first sensor; and facilitating, by the processor, based on the value of the activity visibility metric, adjusting the first viewpoint of the first sensor.

An illustrative non-transitory computer-readable medium stores instructions executable by a processor to access imagery of a scene of a medical session captured by a plurality of sensors from a plurality of viewpoints, the imagery including first imagery captured by a first sensor of the plurality of sensors from a first viewpoint of the plurality of viewpoints; determine, during the medical session and based on the first imagery, a value of an activity visibility metric for the first sensor; and facilitate, based on the value of the activity visibility metric, adjusting the first viewpoint of the first sensor.

An illustrative system includes a memory storing instructions and a processor communicatively coupled to the memory and configured to execute the instructions to access imagery of a scene of a medical session captured by a plurality of sensors from a plurality of viewpoints, the imagery including first imagery captured by a first sensor of the plurality of sensors from a first viewpoint of the plurality of viewpoints; determine, based on the first imagery, a first classification of an activity of the scene; determine, during the medical session and based on the first imagery, a value of an activity visibility metric for the first sensor; determine that the value of the activity visibility metric for the first sensor is below a threshold value of the activity visibility metric; and lower, based on the determining that the value of the activity visibility metric for the first sensor is below the threshold value of the activity visibility metric, a weighting of the first classification of the activity of the scene for determining an overall classification of the activity of the scene based on the imagery of the scene.

An illustrative method includes accessing, by a processor, imagery of a scene of a medical session captured by a plurality of sensors from a plurality of viewpoints, the imagery including first imagery captured by a first sensor of the plurality of sensors from a first viewpoint of the plurality of viewpoints; determining, by the processor, based on the first imagery, a first classification of an activity of the scene; determining, by the processor, during the medical session and based on the first imagery, a value of an activity visibility metric for the first sensor; determining, by the processor, that the value of the activity visibility metric for the first sensor is below a threshold value of the activity visibility metric; and lowering, by the processor, based on the determining that the value of the activity visibility metric for the first sensor is below the threshold value of the activity visibility metric, a weighting of the first classification of the activity of the scene for determining an overall classification of the activity of the scene based on the imagery of the scene.

An illustrative non-transitory computer-readable medium stores instructions executable by a processor to access imagery of a scene of a medical session captured by a plurality of sensors from a plurality of viewpoints, the imagery including first imagery captured by a first sensor of the plurality of sensors from a first viewpoint of the plurality of viewpoints; determine, based on the first imagery, a first classification of an activity of the scene; determine, during the medical session and based on the first imagery, a value of an activity visibility metric for the first sensor; determine that the value of the activity visibility metric for the first sensor is below a threshold value of the activity visibility metric; and lower, based on the determining that the value of the activity visibility metric for the first sensor is below the threshold value of the activity visibility metric, a weighting of the first classification of the activity of the scene for determining an overall classification of the activity of the scene based on the imagery of the scene.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings illustrate various embodiments and are a part of the specification. The illustrated embodiments are merely examples and do not limit the scope of the disclosure. Throughout the drawings, identical or similar reference numbers designate identical or similar elements.

FIG. 1 depicts an illustrative multi-view medical activity recognition system according to principles described herein.

FIG. 2 depicts an illustrative processing system according to principles described herein.

FIGS. 3-4 depict illustrative multi-view medical activity recognition systems according to principles described herein.

FIG. 5 depicts an illustrative computer-assisted robotic surgical system according to principles described herein.

FIG. 6 depicts an illustrative configuration of imaging devices attached to a robotic surgical system according to principles described herein.

FIGS. 7-8 depict illustrative methods according to principles described herein.

FIG. 9 depicts an illustrative computing device according to principles described herein.

DETAILED DESCRIPTION

Systems and methods for visibility metrics in multi-view medical activity recognition are described herein. An activity recognition system may include multiple sensors that include at least two imaging devices configured to capture imagery of a scene from different viewpoints. The activity recognition system may determine, for the imagery captured by each of the imaging devices, one or more activity visibility metrics that represent a visibility of an activity of a scene captured in the imagery. Based on the values of the activity visibility metrics, the activity recognition system may facilitate adjusting one or more of the viewpoints of one or more of the imaging devices, such as to capture, from an additional viewpoint, additional imagery that has a higher value of the activity visibility metric than the initial imagery. Additionally or alternatively, the activity recognition system may use the values of the activity visibility metrics to determine classifications of the activity of the scene.

In certain examples, the scene may be of a medical session such as a surgical session, and activities may include phases of the surgical session. During the medical session, imagery of the scene may be captured by multiple imaging devices. Values of the activity visibility metrics may be determined based on both the content of the imagery and the activity within the medical session. When the values of the activity visibility metrics indicate that an imaging device has a suboptimal view of the activity of the scene, the activity recognition system may provide output configured to facilitate a change to a pose of the imaging device to capture imagery that has a better view of the activity. Thus, the activity recognition system may dynamically adjust the configuration of the imaging devices to capture imagery from viewpoints that optimize visibility of the activity of the scene.

Systems and methods described herein may provide various advantages and benefits. For example, systems and methods described herein may provide accurate, dynamic, and/or flexible activity recognition. Illustrative examples of activity recognition described herein may be more accurate than conventional activity recognition that is based on single-sensor activity recognition or fixed multi-sensor activity recognition. Illustrative examples of systems and methods described herein may be well suited for activity recognition of dynamic and/or complex scenes, such as a scene associated with a medical session.

Various illustrative embodiments will now be described in more detail. The disclosed systems and methods may provide one or more of the benefits mentioned above and/or various additional and/or alternative benefits that will be made apparent herein.

FIG. 1 depicts an illustrative multi-view medical activity recognition system 100 (“system 100”). As shown, system 100 may include multiple sensors (e.g., imaging devices 102-1 and 102-2, collectively “imaging devices 102”) positioned relative to a scene 104. Imaging devices 102 may be configured to image scene 104 by concurrently capturing images of scene 104.

Scene 104 may include any environment and/or elements of an environment that may be imaged by imaging devices 102. For example, scene 104 may include a tangible real-world scene of physical elements. In certain illustrative examples, scene 104 is associated with a medical session such as a surgical procedure. For example, scene 104 may include a surgical scene at a surgical site such as a surgical facility, operating room, or the like. For instance, scene 104 may include all or part of an operating room in which a surgical procedure may be performed on a patient. In certain implementations, scene 104 includes an area of an operating room proximate to a robotic surgical system that is used to perform a surgical procedure. In certain implementations, scene 104 includes an area within a body of a patient. While certain illustrative examples described herein are directed to scene 104 including a scene at a surgical facility, one or more principles described herein may be applied to other suitable scenes in other implementations.

Imaging devices 102 may include any imaging devices configured to capture images of scene 104. For example, imaging devices 102 may include video imaging devices, infrared imaging devices, visible light imaging devices, non-visible light imaging devices, intensity imaging devices (e.g., color, grayscale, black and white imaging devices), depth imaging devices (e.g., stereoscopic imaging devices, time-of-flight imaging devices, infrared imaging devices, etc.), endoscopic imaging devices, any other imaging devices, or any combination or sub-combination of such imaging devices. Imaging devices 102 may be configured to capture images of scene 104 at any suitable capture rates. Imaging devices 102 may be synchronized in any suitable way for synchronous capture of images of scene 104. The synchronization may include operations of the imaging devices being synchronized and/or data sets output by the imaging devices being synchronized by matching data sets to common points in time.
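By way of illustration only, the matching of data sets to common points in time might be sketched as follows. The sketch assumes each imaging device outputs a sorted list of capture timestamps in seconds; the function name `match_frames` and the `tolerance` parameter are hypothetical and are not part of the described system.

```python
from bisect import bisect_left


def match_frames(times_a, times_b, tolerance=0.02):
    """Pair each timestamp in times_a with the nearest timestamp in times_b.

    Assumption: both inputs are sorted lists of capture times in seconds.
    Pairs farther apart than `tolerance` seconds are dropped.
    Returns a list of (index_a, index_b) tuples.
    """
    pairs = []
    for i, t in enumerate(times_a):
        j = bisect_left(times_b, t)
        # Only the neighbors immediately before and at/after t can be nearest.
        candidates = [k for k in (j - 1, j) if 0 <= k < len(times_b)]
        if not candidates:
            continue
        best = min(candidates, key=lambda k: abs(times_b[k] - t))
        if abs(times_b[best] - t) <= tolerance:
            pairs.append((i, best))
    return pairs
```

Frames that have no counterpart within the tolerance are simply left unpaired in this sketch; other implementations may interpolate or hold the last frame instead.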

FIG. 1 illustrates a simple configuration of two imaging devices 102 positioned to capture images of scene 104 from two different viewpoints. This configuration is illustrative. It will be understood that a multi-sensor architecture such as a multi-view architecture may include two or more imaging devices positioned to capture images of scene 104 from two or more different viewpoints. The viewpoint of an imaging device 102 (i.e., the position, orientation, and view settings such as zoom for imaging device 102) determines the content of the images that are captured by imaging device 102. The multi-sensor architecture may further include additional sensors positioned to capture data of scene 104 from additional locations.

System 100 may include a processing system 106 communicatively coupled to imaging devices 102. Processing system 106 may be configured to access imagery captured by imaging devices 102 and determine values of activity visibility metrics for imaging devices 102 as further described herein. Processing system 106 may use the values of the activity visibility metrics to facilitate adjustment of viewpoints of imaging devices 102 and/or to determine activities of scenes of medical sessions (e.g., activity recognition). Such applications for activity visibility metrics are also further described herein.

FIG. 2 illustrates an example configuration of processing system 106 of a multi-view medical activity recognition system (e.g., system 100). Processing system 106 may include, without limitation, a storage facility 202 and a processing facility 204 selectively and communicatively coupled to one another. Facilities 202 and 204 may each include or be implemented by one or more physical computing devices including hardware and/or software components such as processors, memories, storage drives, communication interfaces, instructions stored in memory for execution by the processors, and so forth. Although facilities 202 and 204 are shown to be separate facilities in FIG. 2, facilities 202 and 204 may be combined into fewer facilities, such as into a single facility, or divided into more facilities as may serve a particular implementation. In some examples, each of facilities 202 and 204 may be distributed between multiple devices and/or multiple locations as may serve a particular implementation.

Storage facility 202 may maintain (e.g., store) executable data used by processing facility 204 to perform any of the functionality described herein. For example, storage facility 202 may store instructions 206 that may be executed by processing facility 204 to perform one or more of the operations described herein. Instructions 206 may be implemented by any suitable application, software, code, and/or other executable data instance. Storage facility 202 may also maintain any data received, generated, managed, used, and/or transmitted by processing facility 204.

Processing facility 204 may be configured to perform (e.g., execute instructions 206 stored in storage facility 202 to perform) various operations associated with activity recognition, such as activity recognition of a scene of a medical session performed by a computer-assisted surgical system.

These and other illustrative operations that may be performed by processing system 106 (e.g., by processing facility 204 of processing system 106) are described herein. In the description that follows, any references to functions performed by processing system 106 may be understood to be performed by processing facility 204 based on instructions 206 stored in storage facility 202.

FIG. 3 illustrates an example configuration of processing system 106. As shown, processing system 106 includes activity visibility modules 302 (e.g., activity visibility module 302-1 and activity visibility module 302-2). Activity visibility modules 302 may be configured to access imagery 304 (e.g., imagery 304-1 and imagery 304-2) captured by imaging devices (e.g., imaging devices 102) of an activity recognition system (e.g., system 100) and determine activity visibility metric values 306 (e.g., activity visibility metric value 306-1 and activity visibility metric value 306-2) based on imagery 304. Processing system 106 further includes an activity classifier 308 that may generate an activity classification 310 based on activity visibility metric values 306 and/or provide an output to facilitate one or more viewpoint adjustments 312 of imaging devices 102.

For example, activity visibility module 302-1 may receive imagery 304-1 from imaging device 102-1. Imagery 304-1 may include and/or be represented by any image data that represents a plurality of images, or one or more aspects of images, captured by imaging device 102-1 of a scene (e.g., scene 104), such as a scene of a medical session. For instance, the plurality of images may be one or more video clips that include a series of images captured over a period of time. The video clips may capture one or more activities being performed in scene 104.

Activities may be any action performed in scene 104 by a person or a system. In some examples, activities may be specific to actions performed in association with the medical session, such as predefined phases of the medical session. For instance, a particular surgical session may include 10-20 (or any other suitable number of) different predefined phases, such as sterile preparation, patient roll in, surgery, etc., that may form a defined set of activities from which system 100 classifies activities of scene 104 as captured in particular video clips.

Activity visibility module 302-1 may access imagery 304-1 (e.g., one or more video clips) in any suitable manner. For instance, activity visibility module 302-1 may receive imagery 304-1 from imaging device 102-1, retrieve imagery 304-1 from imaging device 102-1, receive and/or retrieve imagery 304-1 from a storage device and/or any other suitable device that is communicatively coupled to imaging device 102-1, etc.

Activity visibility module 302-1 may determine activity visibility metric value 306-1 based on imagery 304-1. Activity visibility metric value 306-1 may include a score or any other metric that represents a rating of how visible an activity of scene 104 is in the imagery. For example, activity visibility metric value 306-1 may be a number between 1 and 5, with 5 representing a highest activity visibility and 1 representing a lowest activity visibility. The number may be implemented as a whole number (i.e., the score may be one of 1, 2, 3, 4, or 5) or any suitable rational number, which may be rounded to one, two, or any suitable number of decimal places. Alternatively, activity visibility metrics may be implemented using any other suitable range and/or scale.
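By way of illustration only, the 1-to-5 scale described above might be produced as follows. The sketch assumes an upstream estimate normalized to the range 0 to 1 and a linear mapping onto the scale; both assumptions, along with the function name `to_visibility_score`, are hypothetical.

```python
def to_visibility_score(normalized, decimals=None, low=1.0, high=5.0):
    """Map a normalized visibility estimate onto the low..high scale.

    Assumption: `normalized` is an estimate in [0, 1]; values outside that
    range are clamped. With decimals=None the score is a whole number;
    otherwise it is rounded to the given number of decimal places.
    """
    clamped = min(1.0, max(0.0, normalized))
    score = low + clamped * (high - low)
    if decimals is None:
        return int(round(score))
    return round(score, decimals)
```

For instance, `to_visibility_score(0.5)` yields the whole-number score 3, while `to_visibility_score(0.6, decimals=1)` yields 3.4.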

Activity visibility metric value 306-1 may be determined based on any suitable set of factors. In certain examples, activity visibility metric value 306-1 may be based on a general visibility of imagery 304-1 and/or a specific visibility of the activity in imagery 304-1. General visibility may correspond to how generally visible any content of imagery 304-1 is in imagery 304-1. For instance, general visibility may include factors such as distance from scene 104, noise levels in imagery 304-1 as captured by imaging device 102-1, whether imagery 304-1 is in focus, whether imagery 304-1 is overexposed, etc.

On the other hand, specific visibility of the activity may be based on how visible the activity of scene 104 is in imagery 304-1, which may be separate from the general visibility. For example, two video clips may be similarly generally visible (e.g., a similar distance from scene 104 with a similar clarity of content), but based on the activity of scene 104, the specific visibility of the activity (and as a result, activity visibility metric value 306-1) may be different due to elements important for recognition of the activity being visible in one video clip but not in the other video clip. Example factors that may affect specific visibility of the activity may include whether objects are occluding the activity of scene 104, whether imagery 304-1 captures important elements of the activity (e.g., objects of interest for the activity), etc. Specific visibility of the activity may additionally be affected by the general visibility of imagery 304-1. For instance, specific visibility of the activity may be lower in imagery that has a low general visibility due to all the content (including the activity) of imagery 304-1 being unclear.

Thus, activity visibility module 302-1 may determine activity visibility metric value 306-1 based on both imagery 304-1 (e.g., content of imagery 304-1), which may be reflected in the general visibility (and, in some instances, the specific visibility of the activity) of imagery 304-1, as well as the activity of scene 104, which may affect the specific visibility of the activity in imagery 304-1.

An activity visibility module 302 may determine an activity visibility metric value 306 in any suitable manner. For instance, one or more machine learning algorithms may be used to train a machine learning model that is configured to predict activity visibility metric values 306 based on imagery 304 and the activity of scene 104. Such machine learning algorithms and models are further described herein. Additionally or alternatively, activity visibility module 302 may apply an activity recognition algorithm to imagery 304 to identify an activity of scene 104. The activity recognition algorithm may also generate a confidence measure of the identification of the activity, which may be used to determine an activity visibility metric value 306. Additionally or alternatively, activity visibility module 302 may receive information associated with the activity of scene 104, which may be used to determine an activity visibility metric value 306. The information associated with the activity of the scene 104 may be received from any suitable source(s) (e.g., a robotic surgical system, user input, etc.) and may include any information related to the activity of the scene 104 and/or from which information related to the activity of the scene 104 may be derived.
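By way of illustration only, the use of a confidence measure from an activity recognition algorithm might be sketched as follows. The sketch assumes the algorithm emits one raw score (logit) per predefined activity and treats the highest softmax probability as the confidence measure; this choice, and the function names, are hypothetical and are one option among many.

```python
import math


def softmax(logits):
    """Convert raw per-activity scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def visibility_from_confidence(logits, labels):
    """Return (identified activity, confidence in [0, 1]).

    Assumption: the confidence of the top-ranked activity is used as a
    proxy for how visible the activity is in the imagery.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]
```

A confidence near 1 suggests that the activity is clearly recognizable from the viewpoint, whereas a confidence near the uniform value (one divided by the number of activities) suggests that it is not.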

Activity visibility metric values 306 may be output to activity classifier 308, which may determine activity classification 310 based on activity visibility metric values 306 and imagery 304. For example, activity classifier 308 may determine separate, individual classifications of the activity of scene 104 based on imagery 304-1 and imagery 304-2. Activity classifier 308 may then use activity visibility metric values 306 to weight the individual classifications of the activity, using activity visibility metric value 306-1 as (or in addition to) a confidence measure of the classification of the activity based on imagery 304-1 and activity visibility metric value 306-2 as (or in addition to) a confidence measure of the classification of the activity based on imagery 304-2. Based on the weighted classifications, activity classifier 308 may determine an overall classification and output the overall classification as activity classification 310. Additionally or alternatively, activity classifier 308 may selectively use activity visibility metric values 306 for generating activity classification 310 in instances when individual classifications of the activity differ and disregard activity visibility metric values 306 when the individual classifications of the activity are the same. Additionally or alternatively, activity classifier 308 may use a threshold value of the activity visibility metric to determine whether to use one or more activity visibility metrics. For example, if an activity visibility metric value 306 is below the threshold value of the activity visibility metric, activity classifier 308 may disregard the corresponding imagery 304 and/or lower a weighting of the corresponding imagery 304 for classifying the activity. Additionally or alternatively, activity classifier 308 may use activity visibility metrics in any other suitable manner for determining activity classification 310.
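By way of illustration only, the weighting of individual classifications might be sketched as follows. The sketch combines three of the alternatives described above (weighting by the metric value, disregarding the metric when the individual classifications agree, and lowering the weighting below a threshold value). The function name `fuse_classifications`, the default `low_weight_factor`, and the additive voting rule are assumptions.

```python
from collections import defaultdict


def fuse_classifications(per_view, threshold=None, low_weight_factor=0.5):
    """Determine an overall classification from per-view classifications.

    per_view: list of (label, visibility_value) tuples, one per imaging device.
    threshold: optional threshold value of the activity visibility metric.
    low_weight_factor: multiplier applied to views below the threshold
        (0.0 disregards such views entirely). Hypothetical parameter.
    """
    if not per_view:
        raise ValueError("at least one per-view classification is required")
    labels = {label for label, _ in per_view}
    if len(labels) == 1:
        # Individual classifications agree, so the metric values are disregarded.
        return labels.pop()
    votes = defaultdict(float)
    for label, visibility in per_view:
        weight = visibility
        if threshold is not None and visibility < threshold:
            weight *= low_weight_factor
        votes[label] += weight
    return max(votes, key=votes.get)
```

In this sketch, two views with a value of 2.0 that both report "surgery" outvote a single view with a value of 3.5 when no threshold is set, but are outvoted by it when the threshold is 3.0, because their weights are halved.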

Further, activity classifier 308 may determine and output viewpoint adjustment 312 to facilitate adjusting the viewpoint for one or more of imaging devices 102. For example, if activity visibility metric value 306-1 is below a threshold value of the activity visibility metric, activity classifier 308 may facilitate adjusting imaging device 102-1 to change the viewpoint of imaging device 102-1 so that imaging device 102-1 may capture additional imagery of scene 104 from a different viewpoint. For instance, activity classifier 308 may output viewpoint adjustment 312 that includes instruction to move imaging device 102-1 relative to scene 104 to change the viewpoint of imaging device 102-1 so that imaging device 102-1 captures additional imagery with a higher value of the activity visibility metric than the imagery captured from the initial viewpoint.

The instruction to move imaging device 102-1 may include instruction to physically move imaging device 102-1 relative to scene 104 in any suitable way. For example, imaging device 102-1 may include an articulating imaging device configured to articulate relative to scene 104. In certain examples, imaging device 102-1 may articulate because imaging device 102-1 is attached to an articulating support structure such that, when the articulating support structure articulates, imaging device 102-1 articulates correspondingly. In certain examples, imaging device 102-1 is mounted to an articulating arm of a robotic system such as a teleoperated robotic arm of the robotic system. In certain examples, imaging device 102-1 is mounted to an articulating support structure in a surgical facility, such as to an articulating imaging device boom, surgical cart, or other structure in the surgical facility. Viewpoint adjustment 312 may include output configured for the structure for imaging device 102-1. For instance, viewpoint adjustment 312 may include output to the robotic system and/or the articulating support structure to instruct the robotic system and/or the articulating support structure to change a pose of imaging device 102-1. Additionally or alternatively, viewpoint adjustment 312 may include output to a user (e.g., on a screen, etc.) to instruct the user to change the pose of imaging device 102-1.
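By way of illustration only, the threshold comparison and the routing of viewpoint adjustment output might be sketched as follows. The `ViewpointAdjustment` record, the target names, the default threshold of 3.0, and the fallback to user output for devices with no known mount are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ViewpointAdjustment:
    device_id: str
    target: str   # assumed values: "robot", "support_structure", or "user"
    message: str


def plan_adjustments(metric_values, mounts, threshold=3.0):
    """Produce one adjustment per imaging device whose value is below threshold.

    metric_values: dict mapping device id to activity visibility metric value.
    mounts: dict mapping device id to the structure that can move the device;
        devices missing from this dict fall back to on-screen user output.
    """
    adjustments = []
    for device_id, value in metric_values.items():
        if value >= threshold:
            continue  # visibility is acceptable; leave the viewpoint unchanged
        target = mounts.get(device_id, "user")
        message = (f"Change pose of imaging device {device_id}: "
                   f"visibility {value} is below {threshold}")
        adjustments.append(ViewpointAdjustment(device_id, target, message))
    return adjustments
```

The sketch only decides which devices to adjust and where to send the output; how the pose should change is left to the receiving structure or user.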

In addition or alternative to imaging device 102-1 physically moving relative to scene 104, imaging device 102-1 may be considered to move relative to scene 104 in one or more other ways. In certain embodiments, for example, a movement of imaging device 102-1 may include any change to a viewpoint of imaging device 102-1. The change to the viewpoint may be caused by any suitable change to one or more parameters of imaging device 102-1. As an example, a change to a zoom parameter changes the viewpoint of imaging device 102-1. As another example, a change to a spatial position and/or orientation of imaging device 102-1 changes the viewpoint of imaging device 102-1. In such examples, viewpoint adjustment 312 may include output to imaging device 102-1 to change one or more parameters of imaging device 102-1. The viewpoints may be dynamically changed during a medical session (e.g., during any phase of the medical session such as during pre-operative activities (e.g., setup activities), intra-operative activities, and/or post-operative activities).
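By way of illustration only, a viewpoint defined by position, orientation, and view settings such as zoom might be represented as follows, so that a change to any one parameter yields a different viewpoint. The field layout, the units, and the helper name `adjust` are assumptions.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Viewpoint:
    position: tuple      # (x, y, z); units assumed to be meters
    orientation: tuple   # (roll, pitch, yaw); units assumed to be radians
    zoom: float = 1.0


def adjust(viewpoint, **changes):
    """Return a new viewpoint with the given parameters changed.

    The original viewpoint is left untouched, so the initial and adjusted
    viewpoints can be compared (e.g., by their visibility metric values).
    """
    return replace(viewpoint, **changes)
```

For example, `adjust(v, zoom=2.0)` changes the viewpoint without any physical movement of the imaging device, while `adjust(v, position=(0.5, 0.0, 1.0))` corresponds to a change in spatial position.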

In certain illustrative examples, a multi-sensor architecture may include multiple imaging devices 102 mounted on different components of a robotic surgical system, with one or more of the components configured to articulate relative to an imaged scene and relative to one or more of the other components of the robotic surgical system. For example, imaging device 102-1 may be mounted on an articulating or non-articulating component of the robotic system, and imaging device 102-2 may be mounted on another articulating component of the robotic system.

In certain illustrative examples, one or more imaging devices 102 of a multi-sensor architecture may be mounted on additional or alternative components of a surgical facility such as other components in an operating room. For example, imaging device 102-1 may be mounted on an articulating or non-articulating component of a surgical facility, and imaging device 102-2 may be mounted on another articulating component of the surgical facility. As another example, imaging device 102-1 may be mounted on an articulating component of a robotic system, and imaging device 102-2 may be mounted on an articulating or non-articulating component of the surgical facility.

As processing system 106 may determine activity visibility metric values 306 and provide viewpoint adjustments 312 in real time, processing system 106 may facilitate adjusting imaging devices 102 such that imaging devices 102 are continually providing imagery 304 that is optimized for each activity of scene 104. In some examples, viewpoint adjustment 312 may include specific guidance and/or direction as to where or how to move imaging device 102 to improve activity visibility.

In certain examples, activity classifier 308 may include a generative model 314 that may be used to produce generated imagery. The generated imagery may be based on imagery 304 captured by imaging devices 102. As an example, another imaging device other than imaging devices 102-1 and 102-2 (not shown) may be capturing imagery with an activity visibility metric value that indicates that a current viewpoint of the other imaging device is suboptimal for activity visibility. Activity classifier 308 may use generative model 314 to produce generated imagery that is based on imagery 304-1 and imagery 304-2. Using imagery 304-1 and imagery 304-2, the generated imagery may interpolate, model, and/or predict how scene 104 may look from other viewpoints (e.g., viewpoints in between current viewpoints of imaging devices 102-1 and 102-2). Based on the generated imagery, activity classifier 308 may determine a generated value of the activity visibility metric for the generated imagery. Activity classifier 308 may then select a viewpoint with a more optimal generated visibility metric and facilitate adjusting the other imaging device to change a pose of the other imaging device so that the other imaging device may capture imagery of scene 104 from the selected viewpoint.
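By way of illustration only, the selection of a viewpoint based on generated imagery might be sketched as follows. The generative model and the visibility scoring are passed in as callables because their implementations are not specified here; the function name `select_viewpoint` and its signature are hypothetical.

```python
def select_viewpoint(candidates, generate_imagery, score_visibility, current_value):
    """Pick the candidate viewpoint whose generated imagery scores highest.

    candidates: iterable of candidate viewpoints (e.g., viewpoints in between
        the current viewpoints of the imaging devices).
    generate_imagery: callable standing in for the generative model; maps a
        viewpoint to generated imagery.
    score_visibility: callable mapping imagery to a generated value of the
        activity visibility metric.
    current_value: metric value at the current viewpoint of the device to move.

    Returns (viewpoint, value). The viewpoint is None when no candidate
    improves on the current value.
    """
    best_viewpoint, best_value = None, current_value
    for viewpoint in candidates:
        value = score_visibility(generate_imagery(viewpoint))
        if value > best_value:
            best_viewpoint, best_value = viewpoint, value
    return best_viewpoint, best_value
```

Returning None when no candidate improves on the current value avoids moving an imaging device for no expected gain in activity visibility.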

Additionally or alternatively, activity classifier 308 may use generative model 314 to generate imagery from a current viewpoint of an imaging device. For example, in certain embodiments, an imaging device (e.g., imaging device 102-1) may be fixed at a specific location. Activity classifier 308 may determine that activity visibility metric value 306-1 for imaging device 102-1 capturing imagery 304-1 from a viewpoint of the fixed location is below a threshold value of the activity visibility metric. In response, activity classifier 308 may use generative model 314 to produce generated imagery that is from a perspective of the viewpoint of the fixed location but is based on imagery from other imaging devices (e.g., imaging device 102-2, additional imaging devices not shown). Using the imagery from the other imaging devices, which may have higher activity visibility metric values (e.g., at least the threshold value of the activity visibility metric and/or higher than activity visibility metric value 306-1), the generated imagery may be used to supplement and/or replace imagery 304-1. For instance, imaging device 102-1 may have a low value of the activity visibility metric due to an object obstructing the view of scene 104. The generated imagery produced by generative model 314 may reconstruct objects and/or environments of scene 104 using imagery from other viewpoints, as viewed from the viewpoint from imaging device 102-1. The generated imagery and/or imagery 304-1 supplemented with the generated imagery may be provided as output to a user and/or for further processing by system 100.
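As one non-limiting illustration, the decision of whether to produce generated imagery for a fixed imaging device, and which other imaging devices to draw on, may be sketched as follows. The function name `sources_for_generation` and the device labels used as dictionary keys are hypothetical, and the sketch adopts only the "at least the threshold value" alternative described above.

```python
from typing import Dict, List


def sources_for_generation(
    values: Dict[str, float], target: str, threshold: float
) -> List[str]:
    """List devices whose imagery may feed the generative model for `target`.

    An empty list means that generation is not triggered (the target is at
    or above the threshold) or that no other device qualifies as a source.
    """
    if values[target] >= threshold:
        return []
    # Keep only other devices whose metric value is at least the threshold.
    return sorted(
        device
        for device, value in values.items()
        if device != target and value >= threshold
    )
```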

Generative model 314 may be configured to produce generated imagery in any suitable manner. For example, generative model 314 may use one or more machine learning algorithms trained to generate imagery, such as a generative adversarial network (GAN), etc. Additionally or alternatively, generative model 314 may interpolate imagery captured by other imaging devices from different viewpoints, generate models based on imagery captured by other imaging devices, and generate imagery based on the models, etc.

In some examples, activity classifier 308 may determine an overall value of the activity visibility metric that represents a visibility of the activity of scene 104 based on imagery captured by a plurality of imaging devices (e.g., all imaging devices 102 of a multi-view architecture). Activity classifier 308 may also base viewpoint adjustment 312 on the overall value of the activity visibility metric. As a result, in some instances, activity classifier 308 may facilitate adjustment of a particular imaging device to a new viewpoint that may result in a lower value of the activity visibility metric for the particular imaging device, but a higher overall value of the activity visibility metric.
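As one non-limiting illustration, basing a viewpoint adjustment on an overall value of the activity visibility metric may be sketched as follows. The description above does not specify how the overall value is computed; the sketch assumes a simple mean over per-device values, and the names `overall_visibility` and `best_configuration` are hypothetical.

```python
from statistics import mean
from typing import Callable, Dict, Sequence


def overall_visibility(
    per_device: Dict[str, float],
    aggregate: Callable[[Sequence[float]], float] = mean,
) -> float:
    """Combine per-device metric values into one overall value (assumed: mean)."""
    return aggregate(list(per_device.values()))


def best_configuration(
    configurations: Dict[str, Dict[str, float]],
    aggregate: Callable[[Sequence[float]], float] = mean,
) -> str:
    """Return the name of the configuration with the highest overall value."""
    return max(
        configurations,
        key=lambda name: overall_visibility(configurations[name], aggregate),
    )
```

For instance, given a current configuration with per-device values of 0.7 and 0.3 and a candidate configuration with values of 0.6 and 0.8, the sketch selects the candidate even though the first device's own value decreases, because the overall value increases.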

FIG. 4 illustrates an example configuration 400 of a machine learning model 402 (“model 402”) for a multi-view medical activity recognition system (e.g., “system 100”). While configuration 400 shows a training of model 402, which may be used by system 100 for determining activity visibility metric values and classifying activities, system 100 may additionally or alternatively use any suitable machine learning model trained in any suitable manner.

Configuration 400 shows model 402 accessing imagery 404 (e.g., imagery 404-1 through 404-N). Imagery 404 may be in the form of video clips, each including a time-sequenced series of images, that are captured by an imaging device (e.g., imaging device 102-1). Each video clip may include any suitable number (e.g., 16, 32, etc.) of frames (e.g., images).

Model 402 uses activity recognition algorithms 406 (e.g., activity recognition algorithms 406-1 through 406-N) to extract features of respective video clips to determine an activity of the scene captured in the video clips. Activity recognition algorithms 406 may be implemented by any suitable algorithm or algorithms, such as a fine-tuned I3D model or any other neural network or other algorithm.

Activity recognition algorithms 406 each provide an output to a classifier 408 that is configured to receive the output of activity recognition algorithms 406 for a plurality of video clips of imagery 404. Thus, classifier 408 uses features extracted from a plurality of video clips to identify activities in each of the video clips. In some examples, configuration 400 may use classifier 408 for training model 402 but not rely on or include classifier 408 during implementation of model 402, allowing model 402 to identify activities in independent video clips in real time.

Classifier 408 may output a first classification of each video clip to respective long short-term memory (LSTM) algorithms 410 (e.g., LSTM algorithms 410-1 through 410-N). LSTM algorithms 410 may each be configured to process respective video clips, while also communicating with other LSTM algorithms 410 (e.g., an LSTM algorithm for a preceding video clip and a subsequent video clip). Each LSTM algorithm 410 may process a video clip to also extract features for activity recognition for the scene captured by the video clip. LSTM algorithms 410 may output the features to classifiers 412 (e.g., classifiers 412-1 through 412-N).

Classifiers 412 may receive features extracted by LSTM algorithms 410 to identify the activity captured in the corresponding video clip. Classifiers 412 may also receive the first classification of the video clip generated by classifier 408 and base their classifications of the video clip at least in part on the first classification. Based on the features extracted by LSTM algorithms 410 and the first classification generated by classifier 408, classifiers 412 may output final classifications of the activity captured by respective video clips.

Final classifications may be a selection of one or more predefined activities 414 that are associated with a medical session captured by imagery 404. For example, the final classification may be a one-dimensional vector with a length that corresponds to the number of activities predefined for the medical session. Each vector may have values that correspond to probabilities for each of the activities as identified in the respective video clip.
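As one non-limiting illustration, a final classification in the form of a one-dimensional vector of probabilities may be sketched as follows. The sketch assumes a softmax normalization over raw per-activity scores, which is one possible choice and is not stated in the description above; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def classification_vector(scores: Sequence[float]) -> List[float]:
    """Convert raw per-activity scores into probabilities (assumed: softmax).

    The returned vector has one entry per predefined activity.
    """
    peak = max(scores)  # subtract the maximum for numerical stability
    exponentials = [math.exp(score - peak) for score in scores]
    total = sum(exponentials)
    return [value / total for value in exponentials]


def most_likely(activities: Sequence[str], probabilities: Sequence[float]) -> str:
    """Return the predefined activity with the highest probability."""
    return max(zip(activities, probabilities), key=lambda pair: pair[1])[0]
```

The length of the returned vector equals the number of predefined activities, so a medical session with three predefined activities yields a vector of three probabilities.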

The final classifications may be provided to two layers of regression algorithms 416 and 418 (e.g., regression algorithms 416-1 through 416-N and regression algorithms 418-1 through 418-N). Regression algorithms 416 and 418 may be configured to generate an activity visibility score based on the respective video clip. In some examples, the activity visibility score may be further based on the final classification of the respective video clip determined by classifiers 412. In other examples, the activity visibility score may be determined independent of the final classification.

Model 402 may be trained in any suitable manner. For instance, model 402 may be trained end to end using a supervised learning algorithm. Training data sets may include video clips that are labeled with the activity of the scene as captured in the video clip, as well as a value of an activity visibility metric of the activity based on the imagery of the video clip. The activity identification labels may be verified by a user and/or other synchronized imagery from other imaging devices. The activity visibility metric value labels may be provided by users. Based on such labeled data sets, model 402 may learn to receive imagery as inputs and predict activity visibility metrics based on the imagery and the activity, as well as predict activity classifications. The prediction of the value of the activity visibility metric may be based on the activity classification prediction and/or generated independently of the activity classification prediction.
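As one non-limiting illustration, a supervised training objective that jointly covers the activity label and the activity visibility metric label of a video clip may be sketched as follows. The description above does not specify a loss function; the sketch assumes cross-entropy for the activity classification plus a weighted squared error for the activity visibility metric, and the name `joint_loss` and the parameter `visibility_weight` are hypothetical.

```python
import math
from typing import Sequence


def joint_loss(
    probabilities: Sequence[float],
    true_activity_index: int,
    predicted_visibility: float,
    labeled_visibility: float,
    visibility_weight: float = 1.0,
) -> float:
    """Per-clip loss combining classification and visibility regression."""
    # Cross-entropy against the labeled activity; clamp to avoid log(0).
    cross_entropy = -math.log(max(probabilities[true_activity_index], 1e-12))
    # Squared error against the user-provided visibility label.
    squared_error = (predicted_visibility - labeled_visibility) ** 2
    return cross_entropy + visibility_weight * squared_error
```

Setting `visibility_weight` to zero in such a sketch trains the classification alone, which loosely corresponds to the case in which the two predictions are treated independently.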

FIG. 5 shows an example computer-assisted robotic surgical system 500 (“surgical system 500”) associated with system 100. System 100 may be implemented by surgical system 500, connected to surgical system 500, and/or otherwise used in conjunction with surgical system 500. For example, system 100 may be implemented by one or more components of surgical system 500 such as a manipulating system, a user control system, or an auxiliary system. As another example, system 100 may be implemented by a stand-alone computing system communicatively coupled to a computer-assisted surgical system.

As shown, surgical system 500 may include a manipulating system 502, a user control system 504, and an auxiliary system 506 communicatively coupled one to another. Surgical system 500 may be utilized by a surgical team to perform a computer-assisted surgical procedure on a patient 508. As shown, the surgical team may include a surgeon 510-1, an assistant 510-2, a nurse 510-3, and an anesthesiologist 510-4, all of whom may be collectively referred to as “surgical team members 510.” Additional or alternative surgical team members may be present during a surgical session.

While FIG. 5 illustrates an ongoing minimally invasive surgical procedure, it will be understood that surgical system 500 may similarly be used to perform open surgical procedures or other types of surgical procedures that may similarly benefit from the accuracy and convenience of surgical system 500. Additionally, it will be understood that a medical session such as a surgical session throughout which surgical system 500 may be employed may not only include an operative phase of a surgical procedure, as is illustrated in FIG. 5, but may also include preoperative (which may include setup of surgical system 500), postoperative, and/or other suitable phases of the surgical session.

As shown in FIG. 5, manipulating system 502 may include a plurality of manipulator arms 512 (e.g., manipulator arms 512-1 through 512-4) to which a plurality of surgical instruments may be coupled. Each surgical instrument may be implemented by any suitable surgical tool (e.g., a tool having tissue-interaction functions), medical tool, imaging device (e.g., an endoscope, an ultrasound tool, etc.), sensing instrument (e.g., a force-sensing surgical instrument), diagnostic instrument, or the like that may be used for a computer-assisted surgical procedure on patient 508 (e.g., by being at least partially inserted into patient 508 and manipulated to perform a computer-assisted surgical procedure on patient 508). While manipulating system 502 is depicted and described herein as including four manipulator arms 512, it will be recognized that manipulating system 502 may include only a single manipulator arm 512 or any other number of manipulator arms as may serve a particular implementation.

Manipulator arms 512 and/or surgical instruments attached to manipulator arms 512 may include one or more displacement transducers, orientational sensors, and/or positional sensors used to generate raw (i.e., uncorrected) kinematics information. One or more components of surgical system 500 may be configured to use the kinematics information to track (e.g., determine poses of) and/or control the surgical instruments, as well as anything connected to the instruments and/or arms. As described herein, system 100 may use the kinematics information to track components of surgical system 500 (e.g., manipulator arms 512 and/or surgical instruments attached to manipulator arms 512).

User control system 504 may be configured to facilitate control by surgeon 510-1 of manipulator arms 512 and surgical instruments attached to manipulator arms 512. For example, surgeon 510-1 may interact with user control system 504 to remotely move or manipulate manipulator arms 512 and the surgical instruments. To this end, user control system 504 may provide surgeon 510-1 with imagery (e.g., high-definition 3D imagery) of a surgical site associated with patient 508 as captured by an imaging system (e.g., an endoscope). In certain examples, user control system 504 may include a stereo viewer having two displays where stereoscopic images of a surgical site associated with patient 508 and generated by a stereoscopic imaging system may be viewed by surgeon 510-1. Surgeon 510-1 may utilize the imagery displayed by user control system 504 to perform one or more procedures with one or more surgical instruments attached to manipulator arms 512.

To facilitate control of surgical instruments, user control system 504 may include a set of master controls. These master controls may be manipulated by surgeon 510-1 to control movement of surgical instruments (e.g., by utilizing robotic and/or teleoperation technology). The master controls may be configured to detect a wide variety of hand, wrist, and finger movements by surgeon 510-1. In this manner, surgeon 510-1 may intuitively perform a procedure using one or more surgical instruments.

Auxiliary system 506 may include one or more computing devices configured to perform processing operations of surgical system 500. In such configurations, the one or more computing devices included in auxiliary system 506 may control and/or coordinate operations performed by various other components (e.g., manipulating system 502 and user control system 504) of surgical system 500. For example, a computing device included in user control system 504 may transmit instructions to manipulating system 502 by way of the one or more computing devices included in auxiliary system 506. As another example, auxiliary system 506 may receive and process image data representative of imagery captured by one or more imaging devices attached to manipulating system 502.

In some examples, auxiliary system 506 may be configured to present visual content to surgical team members 510 who may not have access to the images provided to surgeon 510-1 at user control system 504. To this end, auxiliary system 506 may include a display monitor 514 configured to display one or more user interfaces, such as images of the surgical site, information associated with patient 508 and/or the surgical procedure, and/or any other visual content as may serve a particular implementation. For example, display monitor 514 may display images of the surgical site together with additional content (e.g., graphical content, contextual information, etc.) concurrently displayed with the images. In some embodiments, display monitor 514 is implemented by a touchscreen display with which surgical team members 510 may interact (e.g., by way of touch gestures) to provide user input to surgical system 500.

Manipulating system 502, user control system 504, and auxiliary system 506 may be communicatively coupled one to another in any suitable manner. For example, as shown in FIG. 5, manipulating system 502, user control system 504, and auxiliary system 506 may be communicatively coupled by way of control lines 516, which may represent any wired or wireless communication link as may serve a particular implementation. To this end, manipulating system 502, user control system 504, and auxiliary system 506 may each include one or more wired or wireless communication interfaces, such as one or more local area network interfaces, Wi-Fi network interfaces, cellular interfaces, etc.

In certain examples, imaging devices such as imaging devices 102 may be attached to components of surgical system 500 and/or components of a surgical facility where surgical system 500 is set up. For example, imaging devices may be attached to components of manipulating system 502.

FIG. 6 depicts an illustrative configuration 600 of imaging devices 102 (imaging devices 102-1 through 102-4) attached to components of manipulating system 502. As shown, imaging device 102-1 may be attached to an orienting platform (OP) 602 of manipulating system 502, imaging device 102-2 may be attached to manipulator arm 512-1 of manipulating system 502, imaging device 102-3 may be attached to manipulator arm 512-4 of manipulating system 502, and imaging device 102-4 may be attached to a base 604 of manipulating system 502. Imaging device 102-1 attached to OP 602 may be referred to as OP imaging device, imaging device 102-2 attached to manipulator arm 512-1 may be referred to as universal setup manipulator 1 (USM1) imaging device, imaging device 102-3 attached to manipulator arm 512-4 may be referred to as universal setup manipulator 4 (USM4) imaging device, and imaging device 102-4 attached to base 604 may be referred to as BASE imaging device. In implementations in which manipulating system 502 is positioned proximate to a patient (e.g., as a patient side cart), placement of imaging devices 102 at strategic locations on manipulating system 502 provides advantageous imaging viewpoints proximate to a patient and a surgical procedure performed on the patient.

In certain implementations, components of manipulating system 502 (or other robotic systems in other examples) may have redundant degrees of freedom that allow multiple configurations of the components to arrive at the same output position of an end effector attached to the components (e.g., an instrument connected to a manipulator arm 512). Accordingly, processing system 106 may direct components of manipulating system 502 to move without affecting the position of an end effector attached to the components. This may allow for repositioning of components to be performed for activity recognition without changing the position of an end effector attached to the components.
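As one non-limiting illustration, movement of redundant components without affecting the position of an end effector may be sketched for a hypothetical planar arm with three revolute joints. The arm, its link lengths, and all function names are assumptions made for illustration and do not describe manipulating system 502. The sketch relies on the standard first-order result that joint motion along a null-space direction of the position Jacobian leaves the end effector position unchanged to first order.

```python
import math
from typing import List, Sequence, Tuple


def end_effector(angles: Sequence[float], lengths: Sequence[float]) -> Tuple[float, float]:
    """Forward kinematics of a planar serial arm: position of the arm tip."""
    x = y = heading = 0.0
    for angle, length in zip(angles, lengths):
        heading += angle
        x += length * math.cos(heading)
        y += length * math.sin(heading)
    return x, y


def jacobian(angles: Sequence[float], lengths: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Rows of the 2 x N Jacobian of tip position with respect to joint angles."""
    headings, heading = [], 0.0
    for angle in angles:
        heading += angle
        headings.append(heading)
    count = len(headings)
    row_x = [-sum(lengths[k] * math.sin(headings[k]) for k in range(j, count))
             for j in range(count)]
    row_y = [sum(lengths[k] * math.cos(headings[k]) for k in range(j, count))
             for j in range(count)]
    return row_x, row_y


def null_space_direction(angles: Sequence[float], lengths: Sequence[float]) -> List[float]:
    """Unit joint-space direction that keeps the tip fixed to first order.

    For three joints, the cross product of the two Jacobian rows is
    orthogonal to both rows and therefore lies in the Jacobian null space.
    """
    if len(angles) != 3:
        raise ValueError("this sketch handles exactly three joints")
    (a1, a2, a3), (b1, b2, b3) = jacobian(angles, lengths)
    direction = [a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1]
    norm = math.sqrt(sum(d * d for d in direction))
    if norm < 1e-12:
        raise ValueError("arm is at a singular configuration")
    return [d / norm for d in direction]
```

In such a sketch, taking small steps along the returned direction changes the joint angles, and hence the pose of anything mounted on an intermediate link such as an imaging device, while the arm tip stays approximately in place. A practical implementation would typically recompute the direction after each small step.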

The illustrated placements of imaging devices 102 to components of manipulating system 502 are illustrative. Additional and/or alternative placements of any suitable number of imaging devices 102 on manipulating system 502, other components of surgical system 500, and/or other components at a surgical facility may be used in other implementations. Imaging devices 102 may be attached to components of manipulating system 502, other components of surgical system 500, and/or other components at a surgical facility in any suitable way.

FIG. 7 illustrates an example method 700 of a multi-view medical activity recognition system. While FIG. 7 illustrates example operations according to one embodiment, other embodiments may omit, add to, reorder, combine, and/or modify any of the operations shown in FIG. 7. One or more of the operations shown in FIG. 7 may be performed by an activity recognition system such as system 100, any components included therein, and/or any implementation thereof.

In operation 702, an activity recognition system may access imagery of a scene of a medical session captured by a plurality of sensors from a plurality of viewpoints, the imagery including first imagery captured by a first sensor of the plurality of sensors from a first viewpoint of the plurality of viewpoints. Operation 702 may be performed in any of the ways described herein.

In operation 704, the activity recognition system may determine, during the medical session and based on the first imagery, a value of an activity visibility metric for the first sensor. Operation 704 may be performed in any of the ways described herein.

In operation 706, the activity recognition system may facilitate, based on the value of the activity visibility metric, adjusting the first viewpoint of the first sensor. Operation 706 may be performed in any of the ways described herein.
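As one non-limiting illustration, operations 702 through 706 may be sketched as a single step that is repeated during the medical session. The callables `determine_visibility` and `facilitate_adjustment` are hypothetical placeholders, and the use of a threshold comparison to decide when to facilitate an adjustment is an assumption of the sketch rather than a requirement of method 700.

```python
from typing import Callable, Dict


def method_700_step(
    imagery_by_sensor: Dict[str, object],
    first_sensor: str,
    determine_visibility: Callable[[object], float],
    threshold: float,
    facilitate_adjustment: Callable[[str, float], None],
) -> float:
    """Run one pass of operations 702, 704, and 706 for the first sensor."""
    # Operation 702: access the first imagery captured by the first sensor.
    first_imagery = imagery_by_sensor[first_sensor]
    # Operation 704: determine the value of the activity visibility metric.
    value = determine_visibility(first_imagery)
    # Operation 706: facilitate adjusting the first viewpoint based on the
    # value (assumed policy: only when the value is below the threshold).
    if value < threshold:
        facilitate_adjustment(first_sensor, value)
    return value
```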

FIG. 8 illustrates an example method 800 of a multi-view medical activity recognition system. While FIG. 8 illustrates example operations according to one embodiment, other embodiments may omit, add to, reorder, combine, and/or modify any of the operations shown in FIG. 8. One or more of the operations shown in FIG. 8 may be performed by an activity recognition system such as system 100, any components included therein, and/or any implementation thereof.

In operation 802, an activity recognition system may access imagery of a scene of a medical session captured by a plurality of sensors from a plurality of viewpoints, the imagery including first imagery captured by a first sensor of the plurality of sensors from a first viewpoint of the plurality of viewpoints. Operation 802 may be performed in any of the ways described herein.

In operation 804, the activity recognition system may determine, based on the first imagery, a first classification of an activity of the scene. Operation 804 may be performed in any of the ways described herein.

In operation 806, the activity recognition system may determine, during the medical session and based on the first imagery, a value of an activity visibility metric for the first sensor. Operation 806 may be performed in any of the ways described herein.

In operation 808, the activity recognition system may determine that the value of the activity visibility metric for the first sensor is below a threshold value of the activity visibility metric. Operation 808 may be performed in any of the ways described herein.

In operation 810, the activity recognition system may lower a weighting of the first classification of the activity of the scene for determining an overall classification of the activity of the scene based on the imagery of the scene. Operation 810 may be performed in any of the ways described herein.
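As one non-limiting illustration, operations 808 and 810 may be sketched as a weighted combination of per-sensor classifications in which a sensor whose value of the activity visibility metric is below the threshold value receives a lowered weighting. The weighted-average fusion, the function name `overall_classification`, and the parameter `reduced_weight` are assumptions of the sketch; method 800 does not prescribe a particular fusion rule.

```python
from typing import Dict, List, Sequence


def overall_classification(
    classifications: Dict[str, Sequence[float]],
    visibility: Dict[str, float],
    threshold: float,
    reduced_weight: float = 0.5,
) -> List[float]:
    """Fuse per-sensor probability vectors into an overall classification."""
    length = len(next(iter(classifications.values())))
    combined = [0.0] * length
    total_weight = 0.0
    for sensor, probabilities in classifications.items():
        # Operations 808 and 810: lower the weighting of a classification
        # whose sensor is below the visibility threshold.
        weight = 1.0 if visibility[sensor] >= threshold else reduced_weight
        total_weight += weight
        for index, probability in enumerate(probabilities):
            combined[index] += weight * probability
    if total_weight == 0.0:
        raise ValueError("no sensor contributes to the overall classification")
    return [value / total_weight for value in combined]
```

With three sensors reporting (0.9, 0.1), (0.2, 0.8), and (0.4, 0.6), equal weights produce a tie at (0.5, 0.5). If the first sensor is below the threshold value and its weighting is halved, the overall classification becomes (0.42, 0.58), which favors the activity seen by the two sensors with better visibility.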

Multi-view medical activity recognition principles, systems, and methods described herein may be used in various applications. As an example, one or more of the activity recognition aspects described herein may be used for surgical workflow analysis in real time or retrospectively. As another example, one or more of the activity recognition aspects described herein may be used for automated transcription of a surgical session (e.g., for purposes of documentation, further planning, and/or resource allocation). As another example, one or more of the activity recognition aspects described herein may be used for automation of surgical sub-tasks. As another example, one or more of the activity recognition aspects described herein may be used for computer-assisted setup of a surgical system and/or a surgical facility (e.g., one or more operations to set up a robotic surgical system may be automated based on perception of a surgical scene and automated movement of the robotic surgical system). These examples of applications of activity recognition principles, systems, and methods described herein are illustrative. Activity recognition principles, systems, and methods described herein may be implemented for other suitable applications.

In some examples, a non-transitory computer-readable medium storing computer-readable instructions may be provided in accordance with the principles described herein. The instructions, when executed by a processor of a computing device, may direct the processor and/or computing device to perform one or more operations, including one or more of the operations described herein. Such instructions may be stored and/or transmitted using any of a variety of known computer-readable media.

A non-transitory computer-readable medium as referred to herein may include any non-transitory storage medium that participates in providing data (e.g., instructions) that may be read and/or executed by a computing device (e.g., by a processor of a computing device). For example, a non-transitory computer-readable medium may include, but is not limited to, any combination of non-volatile storage media and/or volatile storage media. Illustrative non-volatile storage media include, but are not limited to, read-only memory, flash memory, a solid-state drive, a magnetic storage device (e.g., a hard disk, a floppy disk, magnetic tape, etc.), ferroelectric random-access memory (“RAM”), and an optical disc (e.g., a compact disc, a digital video disc, a Blu-ray disc, etc.). Illustrative volatile storage media include, but are not limited to, RAM (e.g., dynamic RAM).

FIG. 9 illustrates an example computing device 900 that may be specifically configured to perform one or more of the processes described herein. Any of the systems, units, computing devices, and/or other components described herein may implement or be implemented by computing device 900.

As shown in FIG. 9, computing device 900 may include a communication interface 902, a processor 904, a storage device 906, and an input/output (“I/O”) module 908 communicatively connected one to another via a communication infrastructure 910. While an example computing device 900 is shown in FIG. 9, the components illustrated in FIG. 9 are not intended to be limiting. Additional or alternative components may be used in other embodiments. Components of computing device 900 shown in FIG. 9 will now be described in additional detail.

Communication interface 902 may be configured to communicate with one or more computing devices. Examples of communication interface 902 include, without limitation, a wired network interface (such as a network interface card), a wireless network interface (such as a wireless network interface card), a modem, an audio/video connection, and any other suitable interface.

Processor 904 generally represents any type or form of processing unit capable of processing data and/or interpreting, executing, and/or directing execution of one or more of the instructions, processes, and/or operations described herein. Processor 904 may perform operations by executing computer-executable instructions 912 (e.g., an application, software, code, and/or other executable data instance) stored in storage device 906.

Storage device 906 may include one or more data storage media, devices, or configurations and may employ any type, form, and combination of data storage media and/or device. For example, storage device 906 may include, but is not limited to, any combination of the non-volatile media and/or volatile media described herein. Electronic data, including data described herein, may be temporarily and/or permanently stored in storage device 906. For example, data representative of computer-executable instructions 912 configured to direct processor 904 to perform any of the operations described herein may be stored within storage device 906. In some examples, data may be arranged in one or more databases residing within storage device 906.

I/O module 908 may include one or more I/O modules configured to receive user input and provide user output. I/O module 908 may include any hardware, firmware, software, or combination thereof supportive of input and output capabilities. For example, I/O module 908 may include hardware and/or software for capturing user input, including, but not limited to, a keyboard or keypad, a touchscreen component (e.g., touchscreen display), a receiver (e.g., an RF or infrared receiver), motion sensors, and/or one or more input buttons.

I/O module 908 may include one or more devices for presenting output to a user, including, but not limited to, a graphics engine, a display (e.g., a display screen), one or more output drivers (e.g., display drivers), one or more audio speakers, and one or more audio drivers. In certain embodiments, I/O module 908 is configured to provide graphical data to a display for presentation to a user. The graphical data may be representative of one or more graphical user interfaces and/or any other graphical content as may serve a particular implementation.

In some examples, any of the systems, modules, and/or facilities described herein may be implemented by or within one or more components of computing device 900. For example, one or more applications 912 residing within storage device 906 may be configured to direct an implementation of processor 904 to perform one or more operations or functions associated with processing system 106 of system 100.

As mentioned, one or more operations described herein may be performed during a medical session, e.g., dynamically, in real time, and/or in near real time. As used herein, operations that are described as occurring “in real time” will be understood to be performed immediately and without undue delay, even if it is not possible for there to be absolutely zero delay.

Any of the systems, devices, and/or components thereof may be implemented in any suitable combination or sub-combination. For example, any of the systems, devices, and/or components thereof may be implemented as an apparatus configured to perform one or more of the operations described herein.

In the description herein, various illustrative embodiments have been described. It will, however, be evident that various modifications and changes may be made thereto, and additional embodiments may be implemented, without departing from the scope of the invention as set forth in the claims that follow. For example, certain features of one embodiment described herein may be combined with or substituted for features of another embodiment described herein. The description and drawings are accordingly to be regarded in an illustrative rather than a restrictive sense. 

1-32. (canceled)
 33. A system comprising: a memory storing instructions; a processor communicatively coupled to the memory and configured to execute the instructions to: access imagery of a scene of a medical session captured by a plurality of sensors from a plurality of viewpoints, the imagery including first imagery captured by a first sensor of the plurality of sensors from a first viewpoint of the plurality of viewpoints; determine, during the medical session and based on the first imagery, a value of an activity visibility metric for the first sensor; and facilitate, based on the value of the activity visibility metric, adjusting the first viewpoint of the first sensor.
 34. The system of claim 33, wherein the facilitating adjusting the first viewpoint of the first sensor comprises providing an output to a robotic system to instruct the robotic system to change a pose of the first sensor.
 35. The system of claim 33, wherein the facilitating adjusting the first viewpoint of the first sensor comprises providing an output to a user to instruct the user to change a pose of the first sensor.
 36. The system of claim 33, wherein: the instructions comprise a machine learning model trained based on training imagery labeled with an activity of scenes captured in the training imagery; and the determining the value of the activity visibility metric for the first sensor comprises using the machine learning model.
 37. The system of claim 33, wherein the processor is further configured to execute the instructions to: access additional imagery of the scene of the medical session captured by the plurality of sensors from another plurality of viewpoints, the additional imagery including second imagery captured by the first sensor from a second viewpoint different from the first viewpoint; and determine, based on the additional imagery, an additional value of the activity visibility metric that is higher than the value of the activity visibility metric.
 38. The system of claim 33, wherein: the imagery of the scene includes: second imagery captured by a second sensor of the plurality of sensors from a second viewpoint of the plurality of viewpoints, and third imagery captured by a third sensor of the plurality of sensors from a third viewpoint of the plurality of viewpoints; and the processor is further configured to execute the instructions to: determine that the value of the activity visibility metric for the first sensor is below a threshold value of the activity visibility metric, and use, based on the determining that the value of the activity visibility metric for the first sensor is below the threshold value of the activity visibility metric, a generative model to produce generated imagery based on the second imagery and the third imagery.
 39. The system of claim 38, wherein the generated imagery comprises imagery generated based on the first viewpoint, the generated imagery having a generated value of the activity visibility metric that is higher than the value of the activity visibility metric for the first sensor.
 40. The system of claim 38, wherein: the generated imagery comprises imagery generated based on a fourth viewpoint, the generated imagery having a generated value of the activity visibility metric that is higher than the value of the activity visibility metric for the first sensor; and the facilitating adjusting the first viewpoint of the first sensor comprises providing an output comprising an instruction to change a pose of the first sensor to capture additional imagery of the scene from the fourth viewpoint.
 41. The system of claim 38, wherein: the processor is further configured to execute the instructions to: determine, based on the second imagery, a value of the activity visibility metric for the second sensor, and determine, based on the third imagery, a value of the activity visibility metric for the third sensor; and the using the generative model to produce generated imagery based on the second imagery and the third imagery is further based on the values of the activity visibility metric for the second sensor and the third sensor being at least the threshold value of the activity visibility metric.
 42. The system of claim 33, wherein: the imagery of the scene includes second imagery captured by a second sensor of the plurality of sensors from a second viewpoint of the plurality of viewpoints; the processor is further configured to execute the instructions to determine, based on the first imagery and the second imagery, an overall value of the activity visibility metric for the plurality of sensors; and the facilitating adjusting the first viewpoint of the first sensor comprises adjusting the first viewpoint to improve the overall value of the activity visibility metric.
 43. The system of claim 42, wherein the facilitating adjusting the first viewpoint results in a lower value of the activity visibility metric for the first sensor and a higher overall value of the activity visibility metric for the plurality of sensors.
 44. The system of claim 33, wherein the value of the activity visibility metric represents a rating of how visible an activity of the scene is in the first imagery.
 45. A system comprising: a memory storing instructions; and a processor communicatively coupled to the memory and configured to execute the instructions to: access imagery of a scene of a medical session captured by a plurality of sensors from a plurality of viewpoints, the imagery including first imagery captured by a first sensor of the plurality of sensors from a first viewpoint of the plurality of viewpoints; determine, based on the first imagery, a first classification of an activity of the scene; determine, during the medical session and based on the first imagery, a value of an activity visibility metric for the first sensor; determine that the value of the activity visibility metric for the first sensor is below a threshold value of the activity visibility metric; and lower, based on the determining that the value of the activity visibility metric for the first sensor is below the threshold value of the activity visibility metric, a weighting of the first classification of the activity of the scene for determining an overall classification of the activity of the scene based on the imagery of the scene.
 46. The system of claim 45, wherein the processor is further configured to execute the instructions to facilitate, based on the determining that the value of the activity visibility metric for the first sensor is below the threshold value of the activity visibility metric, adjusting the first viewpoint of the first sensor.
 47. A method comprising: accessing, by a processor, imagery of a scene of a medical session captured by a plurality of sensors from a plurality of viewpoints, the imagery including first imagery captured by a first sensor of the plurality of sensors from a first viewpoint of the plurality of viewpoints; determining, by the processor, during the medical session and based on the first imagery, a value of an activity visibility metric for the first sensor; and facilitating, by the processor, based on the value of the activity visibility metric, adjusting the first viewpoint of the first sensor.
 48. The method of claim 47, wherein the facilitating adjusting the first viewpoint of the first sensor comprises providing an output to a robotic system to instruct the robotic system to change a pose of the first sensor.
 49. The method of claim 47, wherein the facilitating adjusting the first viewpoint of the first sensor comprises providing an output to a user to instruct the user to change a pose of the first sensor.
 50. The method of claim 47, wherein the determining the value of the activity visibility metric for the first sensor comprises using a machine learning model trained based on training imagery labeled with an activity of scenes captured in the training imagery.
 51. The method of claim 47, further comprising: accessing, by the processor, additional imagery of the scene of the medical session captured by the plurality of sensors from another plurality of viewpoints, the additional imagery including second imagery captured by the first sensor from a second viewpoint different from the first viewpoint; and determining, by the processor, based on the additional imagery, an additional value of the activity visibility metric that is higher than the value of the activity visibility metric.
 52. The method of claim 47, wherein: the imagery of the scene includes: second imagery captured by a second sensor of the plurality of sensors from a second viewpoint of the plurality of viewpoints, and third imagery captured by a third sensor of the plurality of sensors from a third viewpoint of the plurality of viewpoints; and the method further comprises: determining, by the processor, that the value of the activity visibility metric for the first sensor is below a threshold value of the activity visibility metric, and using, by the processor, based on the determining that the value of the activity visibility metric for the first sensor is below the threshold value of the activity visibility metric, a generative model to produce generated imagery based on the second imagery and the third imagery. 